A Light Sliding-Window Part-of-Speech Tagger for the Apertium Free/Open-Source Machine Translation Platform

Authors

  • Gang Chen
  • Mikel L. Forcada
Abstract

This paper describes a free/open-source implementation of the light sliding-window (LSW) part-of-speech tagger for the Apertium free/open-source machine translation platform. First, the mechanism and training process of the tagger are reviewed, and a new method for incorporating linguistic rules is proposed. Second, experiments compare the performance of the tagger under different window settings, with and without Apertium-style “forbid” rules, with and without Constraint Grammar, and against the traditional HMM tagger in Apertium.

Similar resources

Free/open-source Resources in the Apertium Platform for Machine Translation Research and Development (The Prague Bulletin of Mathematical Linguistics)

This paper describes the resources available in the Apertium platform, a free/open-source framework for creating rule-based machine translation systems. Resources within the platform take the form of finite-state morphologies for morphological analysis and generation, bilingual transfer lexica, probabilistic part-of-speech taggers and transfer rule files, all in standardised formats. These reso...

Reuse of Free Resources in Machine Translation between Nynorsk and Bokmål

We describe the development of a two-way shallow-transfer machine translation system between Norwegian Nynorsk and Norwegian Bokmål built on the Apertium platform, using the Free and Open Source resources Norsk Ordbank and the Oslo–Bergen Constraint Grammar tagger. We detail the integration of these and other resources in the system along with the construction of the lexical and structural tran...

apertium-cy - a collaboratively-developed free RBMT system for Welsh to English

apertium-cy (http://www.cymraeg.org.uk) is a rule-based “gisting” machine translation system for Welsh to English, with both engine and data released under the GPL. We summarise the development of apertium-cy, evaluate its output, and discuss the advantages of a collaborative development model combined with rule-based MT for marginalised languages. 1. The Apertium platform apertium-cy is a “gistin...

Training Part-of-Speech Taggers to build Machine Translation Systems for Less-Resourced Language Pairs

In this paper we review an unsupervised method that can be used to train the hidden-Markov-model-based part-of-speech taggers used within the open-source shallow-transfer machine translation (MT) engine Apertium. This method uses the remaining modules of the MT engine and a target language model to obtain part-of-speech taggers that are then used within the Apertium MT engine in order to produce...

Journal:
  • CoRR

Volume: abs/1509.05517

Publication date: 2015